In this paper, we study the ordinary backfitting and smooth backfitting
as methods of fitting additive quantile models. We show that these
backfitting quantile estimators are asymptotically equivalent to the
corresponding backfitting estimators of the additive components in a
specially-designed additive mean regression model. This implies that the
theoretical properties of the backfitting quantile estimators are not
unlike those of backfitting mean regression estimators. We also assess
the finite sample properties of the two backfitting quantile estimators.

1. Introduction. Nonparametric additive models are powerful techniques 
for high-dimensional data. They enable us to avoid the curse of dimensional- 
ity and estimate the unknown functions in high-dimensional settings at the 
same accuracy as in univariate cases. In the mean regression setting, there 
have been many proposals for fitting additive models. These include the ordinary
backfitting procedure of Buja, Hastie and Tibshirani (1989), whose theoretical
properties were studied later by Opsomer and Ruppert (1997) and
Opsomer (2000), the marginal integration technique of Linton and Nielsen 
(1995), and the smooth backfitting of Mammen, Linton and Nielsen (1999), 
Mammen and Park (2006) and Yu, Park and Mammen (2008). It is widely 
accepted that the marginal integration method still suffers from the curse 

Received October 2009; revised January 2010.

1 Supported by Basic Science Research Program through the National Research Foun- 
dation of Korea (NRF) funded by the Ministry of Education, Science and Technology 
(2009-0058380). 

2 Supported by DFG-project MA 1026/9-2 of the Deutsche Forschungsgemeinschaft.

3 Supported by Mid-career Researcher Program through NRF grant funded by the
MEST (No. 2010-0017437).

AMS 2000 subject classifications. Primary 62G08; secondary 62G20. 
Key words and phrases. Backfitting, nonparametric regression, quantile estimation, ad- 
ditive models. 



This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2010, Vol. 38, No. 5, 2857-2883. This reprint differs from the original in 
pagination and typographic detail. 

Y. K. LEE, E. MAMMEN AND B. U. PARK 



of dimensionality since it does not produce rate-optimal estimates unless 
smoothness of the regression function increases with the number of additive 
components. On the contrary, the ordinary backfitting and smooth backfit- 
ting are known to achieve the univariate optimal rate of convergence under 
certain regularity conditions. 

In this paper, we are concerned with nonparametric estimation of additive
conditional quantile functions. Conditional quantile estimation is also
a very useful tool for exploring the structure of the conditional distribu- 
tion of a response given a predictor. A collection of conditional quantiles, 
when graphed, gives a picture of the entire conditional distribution. It can be
used directly to construct conditional prediction intervals. Also, it may be a 
basis for verifying the presence of conditional heteroscedasticity; see Furno 
(2004), for example. Various other applications of conditional quantile es- 
timation may be found in Yu, Lu and Stander (2003). In the nonadditive 
setting, there have been many proposals for this problem, which include the 
work by Jones and Hall (1990), Chaudhuri (1991), Yu and Jones (1998) and 
Lee, Lee and Park (2006). There have also been some proposals for additive
quantile regression. Fan and Gijbels (1996) provided a direct extension of the 
ordinary backfitting method to quantile regression, but without discussing 
its statistical properties. Lu and Yu (2004) gave a heuristic discussion of the 
asymptotic limit of a backfitting local linear quantile estimator. Horowitz 
and Lee (2005) studied an extension of the two-stage procedure of Horowitz 
and Mammen (2004) to quantile regression. Their estimator is a one-step 
kernel smoothing iteration of an orthogonal series estimator. 

The main theme of this paper is to discuss the statistical properties of 
the ordinary and smooth backfitting methods in additive quantile regression. 
The methods are difficult to analyze since there exists no explicit definition 
for the ordinary backfitting estimator and, for both estimators, the objective 
functions defining the estimators are not differentiable. We borrow empir- 
ical process techniques to tackle the problem. In particular, we devise a 
theoretical mean regression model by using a Bahadur representation for 
the sample quantiles. We show that the least squares ordinary and smooth 
backfitting estimators in this theoretical mean regression model are asymp- 
totically equivalent to the corresponding quantile estimators in the original 
model. This makes the theoretical properties of the two backfitting quantile 
estimators well understood from the existing theory for the corresponding 
least squares backfitting mean regression estimators. The theory was con- 
firmed by a simulation study. Also, it was observed in the simulation study 
that the smooth backfitting estimator outperformed the ordinary backfitting 
estimator in additive quantile regression. 

The paper is organized as follows. In the next section, the ordinary and 
smooth backfitting methods for additive quantile regression are introduced 
and their theoretical properties are provided. In Section 3, some computational
aspects of the smooth backfitting method are discussed. The simulation
results for the finite sample properties of the two backfitting methods
are presented in Section 4. Technical details are given in Section 5. 

2. Main results. It is assumed for one-dimensional response variables
$Y^1, \ldots, Y^n$ that

(2.1)  $Y^i = m_0 + m_1(X_1^i) + \cdots + m_d(X_d^i) + \varepsilon^i, \quad 1 \le i \le n.$

Here, $\varepsilon^i$ are error variables, $m_1, \ldots, m_d$ are unknown functions
from $\mathbb{R}$ to $\mathbb{R}$ satisfying $\int m_j(x_j) w_j(x_j)\,dx_j = 0$ for some
weight functions $w_j$, $m_0$ is an unknown constant, and
$X^i = (X_1^i, \ldots, X_d^i)$ are random design points in $\mathbb{R}^d$.
Throughout the paper, we assume that $(X^i, \varepsilon^i)$ are i.i.d. and that
$X_j^i$ takes its values in a bounded interval $I_j$. Furthermore, it is assumed
that the conditional $\alpha$-quantile of $\varepsilon^i$ given $X^i$ equals zero.
This model excludes interesting auto-regression models, but it simplifies our
asymptotic analysis. We expect that our results can be extended to dependent
observations under mixing conditions.

The ordinary backfitting estimator is based on an iterative algorithm. The
estimate of $m_j$ is updated by the following equation:

(2.2)  $\hat m_j^{BF}(x_j) = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \tau_\alpha\Big( Y^i - \theta - \hat m_0^{BF} - \sum_{\ell=1, \ell \ne j}^d \hat m_\ell^{BF}(X_\ell^i) \Big) K_{j,h_j}(x_j, X_j^i).$

Here, $\tau_\alpha$ is the so-called "check function" defined by
$\tau_\alpha(u) = u\{\alpha - I(u < 0)\}$, and $K_{j,g}$ are kernel functions with
bandwidth $g$; see the assumptions below. To simplify the mathematical
argumentation, the minimization in (2.2) runs over a compact set $\Theta$. It is
assumed that all values of the function $m_j$ lie in the interior of $\Theta$. As
in the case of mean regression, the ordinary backfitting estimator is not
defined as a solution of a global minimization problem.
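As a small illustration of the update (2.2): for a fixed point $x_j$, a minimizer over $\theta$ of a kernel-weighted sum of check losses is a weighted $\alpha$-quantile of the partial residuals. The sketch below is not the authors' implementation. The names `check_loss`, `weighted_quantile` and `objective` are hypothetical, the restriction to the compact set $\Theta$ is ignored, and the toy weights stand in for the kernel values $K_{j,h_j}(x_j, X_j^i)$.

```python
def check_loss(u, alpha):
    """Check function tau_alpha(u) = u * (alpha - I(u < 0))."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def weighted_quantile(values, weights, alpha):
    """Return a minimizer over theta of sum_i w_i * tau_alpha(v_i - theta).

    A minimizer is the smallest value whose cumulative weight reaches
    alpha times the total weight (weights must be nonnegative).
    """
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= alpha * total:
            return v
    return pairs[-1][0]


def objective(theta, values, weights, alpha):
    """Kernel-weighted check loss evaluated at theta."""
    return sum(w * check_loss(v - theta, alpha) for v, w in zip(values, weights))


# Toy partial residuals Y^i - m_0 - sum_{l != j} m_l(X_l^i) and toy kernel weights.
residuals = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]
kernel_weights = [0.5, 1.0, 0.2, 0.8, 0.7, 0.1]
theta_hat = weighted_quantile(residuals, kernel_weights, alpha=0.5)
```

In a full backfitting cycle one would recompute the weights for every evaluation point $x_j$ and every component $j$; this sketch covers a single point only.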

The smooth backfitting estimator is also based on an iterative algorithm.
The estimate of $m_j$ is updated by the following integral equation:

(2.3)  $\hat m_j^{SBF}(x_j) = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \int \tau_\alpha\Big( Y^i - \theta - \hat m_0^{SBF} - \sum_{\ell=1, \ell \ne j}^d \hat m_\ell^{SBF}(x_\ell) \Big) \prod_{\ell \ne j} \{ K_{\ell,h_\ell}(x_\ell, X_\ell^i)\,dx_\ell \} \cdot K_{j,h_j}(x_j, X_j^i),$

where the integration is over the support of
$(X_1^i, \ldots, X_{j-1}^i, X_{j+1}^i, \ldots, X_d^i)$. This is an iterative scheme
for obtaining $\hat m_j^{SBF}$, $j = 0, 1, \ldots, d$, which minimize

(2.4)  $\sum_{i=1}^n \int \tau_\alpha\Big( Y^i - \hat m_0^{SBF} - \sum_{j=1}^d \hat m_j^{SBF}(x_j) \Big) K_{1,h_1}(x_1, X_1^i) \cdots K_{d,h_d}(x_d, X_d^i)\,dx_1 \cdots dx_d,$

where the integration is over the support of $X^i$. The minimizations or
iterations are done under the constraints

(2.5)  $\int \hat m_j^l(x_j) w_j(x_j)\,dx_j = 0, \quad j = 1, \ldots, d$ and $l$ = BF, SBF

for some weight functions $w_j$. One may take unknown weight functions such
as the marginal densities of $X_j$ and use consistent estimators of them as
the weight functions $w_j$ in the integrals (2.5). But this would lead to more
complicated bias calculation.

We compare our model (2.1) with the following theoretical model. For
$i = 1, \ldots, n$, let $Z^1, \ldots, Z^n$ be one-dimensional variables such that

(2.6)  $Z^i = m_0 + m_1(X_1^i) + \cdots + m_d(X_d^i) + \eta^i.$

Here, the constant $m_0$, the functions $m_1, \ldots, m_d$ and the covariates
$X_1^i, \ldots, X_d^i$ are those in (2.1). The error variables $\eta^i$ are defined by

$\eta^i = -\dfrac{I(\varepsilon^i < 0) - \alpha}{f_{\varepsilon|X}(0|X^i)},$

where $f_{\varepsilon|X}$ is the conditional density of $\varepsilon$ given $X$. This
definition is motivated by the Bahadur representation of sample quantiles
[Bahadur (1966)]. For an independent sample $\varepsilon^1, \ldots, \varepsilon^n$
with densities $f_i$ and $\alpha$-quantiles equal to 0, the Bahadur expansion
states that the $\alpha$th sample quantile $\hat\theta_\alpha$ of
$\varepsilon^1, \ldots, \varepsilon^n$ is asymptotically equivalent to the weighted
average

$\Big( \sum_{i=1}^n f_i(0) \Big)^{-1} \sum_{i=1}^n f_i(0)\,\eta^i,$

where $\eta^i = -\{I(\varepsilon^i < 0) - \alpha\}/f_i(0)$. Thus, the estimator
$\hat\theta_\alpha$ is asymptotically equivalent to the minimizer of

$\sum_{i=1}^n f_i(0)\,(\eta^i - \theta)^2.$
This consideration suggests that the ordinary and smooth backfitting
estimators defined at (2.2) and (2.3), respectively, may be approximated well
by the corresponding weighted local least squares estimators in the model
(2.6). Note that the model (2.6) is an additive model with errors $\eta^i$ having
conditional mean zero given the covariates $X^i$. Thus, the weighted ordinary
backfitting estimators $\hat m_j^{*,BF}$ in this model are defined by the
following iterations:

(2.7)  $\hat m_j^{*,BF}(x_j) = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \Big\{ Z^i - \theta - \hat m_0^{*,BF} - \sum_{\ell=1, \ell \ne j}^d \hat m_\ell^{*,BF}(X_\ell^i) \Big\}^2 f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i)$

       $= \Big[ \sum_{i=1}^n \Big\{ Z^i - \hat m_0^{*,BF} - \sum_{\ell=1, \ell \ne j}^d \hat m_\ell^{*,BF}(X_\ell^i) \Big\} f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i) \Big] \times \Big[ \sum_{i=1}^n f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i) \Big]^{-1}.$
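The second equality in (2.7) is the usual closed form of a weighted least squares fit of a constant. A minimal sketch, with hypothetical names and toy numbers: `r` plays the role of $Z^i - \hat m_0^{*,BF} - \sum_{\ell \ne j} \hat m_\ell^{*,BF}(X_\ell^i)$ and `w` the role of $f_{\varepsilon|X}(0|X^i) K_{j,h_j}(x_j, X_j^i)$.

```python
def weighted_ls_update(pseudo_resid, weights):
    """Minimizer over theta of sum_i w_i * (r_i - theta)^2, i.e. the weighted mean."""
    return sum(w * r for r, w in zip(pseudo_resid, weights)) / sum(weights)


def weighted_sq_loss(theta, pseudo_resid, weights):
    """Weighted squared-error criterion evaluated at theta."""
    return sum(w * (r - theta) ** 2 for r, w in zip(pseudo_resid, weights))


# Toy pseudo-residuals and toy weights (density value times kernel value).
r = [0.4, -0.1, 1.3, 0.7]
w = [0.9, 0.3, 0.5, 1.2]
theta_star = weighted_ls_update(r, w)
```

Unlike the quantile update (2.2), this update is linear in the responses, which is why the existing theory for mean regression backfitting applies to it directly.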

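Also, the weighted smooth backfitting estimators $\hat m_j^{*,SBF}$ in the model
(2.6) are defined by

(2.8)  $\hat m_j^{*,SBF}(x_j) = \tilde m_j^{*,SBF}(x_j) - \hat m_0^{*,SBF} - \sum_{\ell=1, \ell \ne j}^d \int \hat m_\ell^{*,SBF}(x_\ell)\, \dfrac{\hat f^w_{X_j,X_\ell}(x_j, x_\ell)}{\hat f^w_{X_j}(x_j)}\,dx_\ell,$

where

$\tilde m_j^{*,SBF}(x_j) = n^{-1} \sum_{i=1}^n Z^i f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i)\, \hat f^w_{X_j}(x_j)^{-1},$

$\hat f^w_{X_j}(x_j) = n^{-1} \sum_{i=1}^n f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i),$

$\hat f^w_{X_j,X_\ell}(x_j, x_\ell) = n^{-1} \sum_{i=1}^n f_{\varepsilon|X}(0|X^i)\, K_{j,h_j}(x_j, X_j^i)\, K_{\ell,h_\ell}(x_\ell, X_\ell^i)$

are weighted modifications of the marginal Nadaraya-Watson estimator and
the kernel estimators of the one- and two-dimensional marginal densities of
$X$, respectively. The latter two are in fact kernel estimators of

$f^w_{X_j}(x_j) = \int f_{\varepsilon|X}(0|x) f_X(x)\,dx_{-j} = f_{\varepsilon,X_j}(0, x_j),$

$f^w_{X_j,X_\ell}(x_j, x_\ell) = \int f_{\varepsilon|X}(0|x) f_X(x)\,dx_{-(j,\ell)} = f_{\varepsilon,X_j,X_\ell}(0, x_j, x_\ell),$

respectively, where $x_{-j} = (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d)^T$ and
$x_{-(j,\ell)}$ is a vector that has elements $x_l$ with $1 \le l \le d$ and
$l \ne j, \ell$.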
Our first result (Proposition 2.1) shows that each application of the
updating equations (2.7) and (2.8) in the theoretical model (2.6) leads to
results that are asymptotically equivalent to those of (2.2) and (2.3),
respectively, in the original model (2.1). In the next step, we will apply
Proposition 2.1 to iterative applications of the backfitting updates. We will
show that the asymptotic equivalence continues to hold for iterative
applications of the backfitting procedures as long as the number of iterations
is small enough. By extending the results for backfitting and smooth
backfitting estimators in mean regression, we will use this fact to get our
main result (Theorem 2.2). The latter states an asymptotic normality result for
the ordinary and smooth backfitting quantile estimators in additive models. Its
proof is based on an argument that carries an asymptotic normality result in
mean regression over to quantile regression.

We now introduce assumptions that guarantee asymptotic equivalence
between the mean and the quantile backfitting estimators after one cycle of
update. Further assumptions that are needed for iterative updates will be
given after Proposition 2.1. For simplicity, we state Proposition 2.1 and its
conditions only for the updates of the first additive component. In abuse of
notation, we denote the estimators of the components $m_j$, $2 \le j \le d$, at
the preceding iteration step, by $\hat m_2^l, \ldots, \hat m_d^l$, where $l$ stands
for BF, SBF, *,BF or *,SBF. The updates of the first component that are
obtained by plugging these estimators into (2.2), (2.3), (2.7) and (2.8),
respectively, are denoted by $\hat m_1^{BF}$, $\hat m_1^{SBF}$, $\hat m_1^{*,BF}$ and
$\hat m_1^{*,SBF}$. Thus, for simplicity of notation, we use the same kind of
symbol for the updates ($j = 1$) and for the inputs of the backfitting
algorithms ($2 \le j \le d$).

We make the following assumptions:

(A1) The $d$-dimensional vector $X^i$ has compact support
$I = I_1 \times \cdots \times I_d$ for bounded intervals $I_j = [a_j, b_j]$ and its
density $f_X$ is continuous and strictly positive on $I$.

(A2) There exist constants $C_K, C_S > 0$ such that for all $x_j \in I_j$,
$1 \le j \le d$, the kernels $K_{j,g}(x_j, \cdot)$ are positive, bounded by
$C_K g^{-1}$, have bounded support $\subset [x_j - C_S g, x_j + C_S g]$, and are
Lipschitz continuous with Lipschitz constant bounded by $C_K g^{-2}$. The weight
functions $w_j$ are bounded functions with $w_j(x_j) \ge 0$ for $x_j \in I_j$ and
$\int w_j(x_j)\,dx_j > 0$.

(A3) The conditional density $f_{\varepsilon|X}(0|x)$ of $\varepsilon$ given $X = x$
is bounded away from zero and infinity for $x \in I$. Furthermore, it satisfies
the following uniform Lipschitz condition:

$|f_{\varepsilon|X}(e|x) - f_{\varepsilon|X}(0|x)| \le C_1 |e|$

for $x \in I$ and for $e$ in a neighborhood of 0 with a constant $C_1 > 0$ that
does not depend on $x$.

(A4) The bandwidths $h_1, \ldots, h_d$ are of order $n^{-1/5}$.
Assumptions (A1)-(A4) are standard smoothing assumptions. In particular,
(A2) is fulfilled for convolution kernels with an appropriate boundary
correction.

For the properties of the updated estimators, the estimators at the
preceding iteration step need to fulfill certain regularity conditions. We will
proceed with the following assumptions that are stated for some constants
$0 < \rho \le 1$, $\Delta_1, \Delta_2, \Delta_3 > 0$ and $0 < \xi < (1 + \rho)\Delta_1$.

(A5) For $j = 2, \ldots, d$, it holds for $l$ = BF and $l$ = SBF that

$\sup_{a_j + C_S h_j \le x_j \le b_j - C_S h_j} |\hat m_j^l(x_j) - m_j(x_j)| = O_P\big( n^{-(4+4\rho)/(10+15\rho) - \Delta_1} \big),$

$\sup_{a_j \le x_j \le b_j} |\hat m_j^l(x_j) - m_j(x_j)| = O_P\big( n^{-[(4+4\rho)/(10+15\rho) - \Delta_1]/2} \big).$

(A6) There exist random functions $g_2, \ldots, g_d$ with derivatives that fulfill
the Lipschitz condition

$|g_j'(x_j) - g_j'(x_j^*)| \le C |x_j - x_j^*|^{\rho} n^{\xi}$

for $j = 2, \ldots, d$ and $x_j, x_j^* \in I_j$. Furthermore, these functions satisfy

$\sup_{a_j \le x_j \le b_j} |\hat m_j^l(x_j) - g_j(x_j)| = O_P\big( n^{-2/5 - \Delta_2} \big)$

for $l$ = BF and $l$ = SBF.

(A7) For $j = 2, \ldots, d$, it holds for $l$ = BF and $l$ = SBF that

$\sup_{a_j + C_S h_j \le x_j \le b_j - C_S h_j} |\hat m_j^l(x_j) - \hat m_j^{*,l}(x_j)| = O_P\big( n^{-2/5 - \Delta_3} \big),$

$\sup_{a_j \le x_j \le b_j} |\hat m_j^l(x_j) - \hat m_j^{*,l}(x_j)| = O_P\big( n^{-1/5 - \Delta_3} \big).$

We briefly comment on the assumptions (A5)-(A7). A more detailed
discussion is given after Theorem 2.2. Assumption (A5) requires suboptimal
rates for the preceding estimators that are plugged in for the update of
the first component. Assumption (A6) states that the class of possible
realizations of the preceding estimators is not too rich. We assume that the
preceding estimators are in a neighborhood of the class of functions with
Lipschitz continuous derivatives. Other classes could be used, but for a
Lipschitz class it is relatively easy to check if a function belongs to it.
Note that we do not assume that the quantile estimator itself has a smooth
derivative. In general, such an assumption does not hold because quantile
kernel estimators are not smooth. Assumption (A7) is very natural. It states
that the estimators that are plugged into the updating equation of the quantile
model and of the mean regression model differ only by second order terms.
Without this assumption, it cannot be expected that the updated estimators
also differ only by second order terms. We will see below that this
assumption is automatically fulfilled if we apply Proposition 2.1 for an
analysis of iterative applications of the backfitting algorithms. In the
assumptions (A5) and (A7), if one replaces the interior region
$[a_j + C_S h_j, b_j - C_S h_j]$ by the whole range $[a_j, b_j]$ and if one uses
boundary corrected kernels, then one can also replace in Proposition 2.1 the
suprema over the interior region by those over the whole range, and the
estimators achieve the rate $n^{-2/5}$ at the boundary, too.

Proposition 2.1. Under the assumptions (A1)-(A7), it holds for the
updated estimators with $l$ = BF and with $l$ = SBF that for some $\delta > 0$

$\sup_{a_1 + C_S h_1 \le x_1 \le b_1 - C_S h_1} |\hat m_1^l(x_1) - \hat m_1^{*,l}(x_1)| = O_P\big( n^{-2/5 - \delta} \big),$

$\sup_{a_1 \le x_1 \le b_1} |\hat m_1^l(x_1) - \hat m_1^{*,l}(x_1)| = O_P\big( n^{-1/5 - \delta} \big).$

The additional factor $n^{-\delta}$ allows an iterative application of the
proposition. This has an important implication. We recall that the backfitting
algorithms for mean regression have a geometric rate of convergence. In
particular, in the case of smooth backfitting, only square integrability of
the initial estimator is required for the algorithm to achieve the geometric
rate of convergence; see Theorem 1 of Mammen, Linton and Nielsen (1999).

Suppose one chooses square integrable functions, say
$\hat m_2^{BF,[0]}, \ldots, \hat m_d^{BF,[0]}$, as the starting values in the
algorithm for the backfitting quantile estimator and that one runs a cycle of
backfitting iterations (2.2) for $j = 1, \ldots, d$. Then we get updates
$\hat m_1^{BF,[l]}, \ldots, \hat m_d^{BF,[l]}$ with $l = 1$ and after further cycles
with $l > 1$. (Note that by construction of the backfitting estimator we do
not need a pilot version of $\hat m_1^{BF,[0]}$.) Then, one can think of running
the backfitting mean regression algorithm (2.7) with the same initial
estimators $\hat m_2^{BF,[0]}, \ldots, \hat m_d^{BF,[0]}$ in parallel with the
backfitting quantile regression algorithm (2.2). This results in updates
$\hat m_1^{*,BF,[l]}, \ldots, \hat m_d^{*,BF,[l]}$ for $l \ge 1$. In the proof of our
next theorem, we will see that after $l$ cycles of the two parallel iterations,
the difference $\hat m_j^{BF,[l]} - \hat m_j^{*,BF,[l]}$ is of order
$O_P(n^{-2/5-\delta})$ in the interior, and of order $O_P(n^{-1/5-\delta})$ at the
boundaries. This holds as long as $l \le C_{\mathrm{iter}} \log n$ with
$C_{\mathrm{iter}}$ small enough. On the other hand, we will show that
$\hat m_j^{*,BF,[C_{\mathrm{iter}} \log n]}$ is asymptotically equivalent to the
limit of the backfitting algorithm $\hat m_j^{*,BF,[\infty]}$, if
$C_{\mathrm{iter}}$ is large enough. If the pilot estimators
$\hat m_2^{BF,[0]}, \ldots, \hat m_d^{BF,[0]}$ are accurate enough, then the constant
$C_{\mathrm{iter}}$ can be chosen such that both requirements are fulfilled. This
will allow us to get the asymptotic limit distribution of
$\hat m_j^{*,BF,[C_{\mathrm{iter}} \log n]}$ and thus that of
$\hat m_j^{BF,[C_{\mathrm{iter}} \log n]}$.

Similar findings also hold for the smooth backfitting estimator. We denote
the starting values by $\hat m_2^{SBF,[0]}, \ldots, \hat m_d^{SBF,[0]}$ and the
updates by $\hat m_1^{SBF,[l]}, \ldots, \hat m_d^{SBF,[l]}$ or
$\hat m_1^{*,SBF,[l]}, \ldots, \hat m_d^{*,SBF,[l]}$, respectively.

The following theorem summarizes our discussion. For the theorem, we
need the following additional assumptions:

(A8) There exist constants $c_K, C_D > 0$, $C_S' > 0$ such that for
$a_j + C_S' h_j \le x_j, u_j \le b_j - C_S' h_j$ it holds that
$K_{j,h_j}(x_j, u_j) = h_j^{-1} K[h_j^{-1}(x_j - u_j)]$ for a function $K$ with
$\int K(v)\,dv = 1$ and $\int v K(v)\,dv = 0$. For all $x_j, u_j \in I_j$,
$1 \le j \le d$, the kernels $K_{j,g}(x_j, u_j)$ have a second derivative w.r.t.
$x_j$ that is bounded by $C_D g^{-3}$ and they fulfill
$\int K_{j,g}(x_j, v_j)\,dv_j \ge c_K$ and $\int K_{j,g}(v_j, u_j)\,dv_j = 1$.

(A9) The function
$f^w_{X_k|X_j}(x_k|x_j) = f^w_{X_j,X_k}(x_j, x_k)/f^w_{X_j}(x_j)$ has a second
derivative w.r.t. $x_j$ that is bounded over $x_j \in I_j$, $x_k \in I_k$,
$1 \le j, k \le d$.

The last condition in (A8) implies that the one-dimensional kernel density
estimators integrate to one and that they are equal to the corresponding
marginalization of higher-dimensional product-kernel density estimators.
This assumption simplifies bias calculation of the backfitting estimators.

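Theorem 2.2. Assume that (A1)-(A4), (A8) and (A9) hold, and that
(A5) and (A6) are satisfied by $\hat m_j^{BF} = \hat m_j^{BF,[0]}$ and
$\hat m_j^{SBF} = \hat m_j^{SBF,[0]}$ ($j = 2, \ldots, d$) with $\xi$, $\Delta_2$,
$\Delta_3$, $[\ldots] - \Delta_1 > 0$ small enough. Then, we get for
$\hat m_j^{l,\mathrm{iter}} = \hat m_j^{l,[C_{\mathrm{iter}} \log n]}$ with an
appropriate choice of $C_{\mathrm{iter}} = C_{\mathrm{iter},l}$ ($l$ = BF and
$l$ = SBF) that for $a_j < x_j < b_j$

$\sqrt{n h_j}\,\big\{ \hat m_j^{l,\mathrm{iter}}(x_j) - m_j(x_j) - h_j^2 \beta_j(x_j) \big\} \to N\Big( 0,\ \alpha(1-\alpha)\, \dfrac{f_{X_j}(x_j)}{f^w_{X_j}(x_j)^2} \int K^2(u)\,du \Big)$

in distribution, where
$\beta_j(x_j) = \beta_j^*(x_j) - \int \beta_j^*(u_j) w_j(u_j)\,du_j$,
$\beta_j^*(x_j) = h_j^{-2} m_j'(x_j) \int (u_j - x_j) K_{j,h_j}(x_j, u_j)\,du_j + \tfrac{1}{2} \mu_{2,K}\, m_j''(x_j) + \mu_{2,K}\, \beta_j^{**}(x_j)$,
$\mu_{2,K} = \int v^2 K(v)\,dv$ and $(\beta_1^{**}, \ldots, \beta_d^{**})$ is a tuple of
functions that minimizes

$\int \big[\, \ldots \,\big]^2 f_{\varepsilon,X}(0, x)\,dx.$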
Note that the first term in the definition of $\beta_j^*$ is of order $n^{1/5}$
at the boundary but vanishes in the interior of $I_j$. Because of the norming
with the weight function $w_j$, the bias function $\beta_j$ is shifted from
$\beta_j^*$ by $\int \beta_j^*(u_j) w_j(u_j)\,du_j$. One can estimate the bias and
the variance terms because they only require two-dimensional objects if one
calculates them with the backfitting algorithms.

We now come back to discussion of the assumptions (A5)-(A7). Assump- 
tion (A5) allows that the starting estimators have a suboptimal rate. In 
particular, it requires that the starting estimators are consistent. For exam- 
ple, one could use here orthogonal series estimators, smoothing splines or 
sieve estimators. In the simulations, we got good results by using constant 
functions as starting values, that is, functions that are not consistent. For 
backfitting mean regression, it is known that every starting value works. Be- 
cause of the nonlinearity of quantile regression, we do not expect that such a 
result can be proved for quantile regression. In our result, we did not specify 
the required rate for the pilot estimator. But, if one does this, we conjecture 
that one can get the statement of Theorem 2.2 with pilot estimators that 
have much slower rates. For such a theorem, one has to prove a modification
of Proposition 2.1 with the following statement: for the estimators at
the preceding stage of the backfitting algorithms, less accurate error bounds
would suffice to get that the difference between the backfitting estimators
$\hat m_1^l$ and $\hat m_1^{*,l}$ at the current stage of the algorithm is of
higher order than the accuracy of the preceding estimators. This would allow
one to weaken the assumptions on the rate of the starting estimators.

Assumption (A7) is not required for Theorem 2.2. This is because run- 
ning the iterative algorithms (2.7) and (2.8) is only imaginary and in the 
proof we choose to use the same starting values as in the real iterative al- 
gorithms (2.2) and (2.3), respectively. Thus, (A7) is automatically satisfied 
at the beginning of the iterations. Proposition 2.1 tells us that the updated 
estimators also fulfill (A7). This holds with the same rate but with
multiplicative factors. For this reason, after $L$ backfitting cycles the
difference between the mean regression and the quantile regression estimators
is not of order $(C \times L)\, n^{-2/5-\delta}$, but of order
$C^L n^{-2/5-\delta}$, for some $\delta > 0$, $C > 1$. For a number of iterations
$C_{\mathrm{iter}} \log n$ such that $C_{\mathrm{iter}} \log C < \delta$, this is of
order $o(n^{-2/5})$.
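The arithmetic behind the last statement can be written out in one line, taking $L = C_{\mathrm{iter}} \log n$ cycles and using only the constants $C$, $\delta$ and $C_{\mathrm{iter}}$ named above:

```latex
C^{L}\, n^{-2/5-\delta}
  = \exp\{ C_{\mathrm{iter}} \log C \cdot \log n \}\, n^{-2/5-\delta}
  = n^{-2/5 - (\delta - C_{\mathrm{iter}} \log C)}
  = o\big(n^{-2/5}\big)
  \quad \text{whenever } C_{\mathrm{iter}} \log C < \delta .
```

The geometric growth factor $C^L$ therefore costs only a polynomial factor $n^{C_{\mathrm{iter}} \log C}$, which the margin $n^{-\delta}$ absorbs when the number of cycles is logarithmic with a small enough constant.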

Compared with the results for mean regression backfitting estimators, our 
results for quantile estimation are weaker in two aspects. First, we need ini- 
tial estimators that are consistent, whereas in mean regression one can start 
with arbitrary initial values. This restriction comes from the nonlinearity of 
the quantile functional. Second, we put restrictions on the number of itera- 
tion steps. It must be of logarithmic order with a factor that is not too small 
and not too large. When running the two parallel backfitting procedures
for mean and quantile regression, we were not able to control in the proof the
difference between the two outcomes if the number of iterations is too large.
We conjecture that both restrictions are necessary only for technical reasons 
in our approach for the proof. In our simulation, we started with noncon- 
sistent pilot estimators and we let the algorithms run until the outcomes 
were stabilized. According to our experience in the simulation, there seemed 
practically no advantage in limiting the number of iterations and there was 
also no problem when starting the algorithm with initial estimators that 
were far away from the corresponding underlying regression functions. 

A natural extension of our results is to study local polynomial quantile 
estimators. This can be done along the lines of this paper by putting smooth- 
ness restrictions also on the higher order terms of the local polynomial fit. 
This can be done relatively easily for local polynomial smooth backfitting. 
For local polynomial ordinary backfitting, it would require also essentially 
new theoretical results for mean regression. We do not follow this line in this 
paper. 

3. Numerical implementation. In practical implementations of the smooth
backfitting method, one may approximate the integral at (2.3) by Monte
Carlo integration. This can be done in several ways. In one version, one
generates $(U_2^j, \ldots, U_d^j)$ for $1 \le j \le M$ from a $(d-1)$-variate
uniform distribution on $I_2 \times \cdots \times I_d$. Then an approximation of
$\hat m_1^{SBF}(x_1)$ may be obtained by

$\hat m_1^{SBF}(x_1) \approx \arg\min_{\theta} \sum_{i=1}^n \sum_{j=1}^M \tau_\alpha\big( Y^i - \theta - \hat m_0^{SBF} - \hat m_2^{SBF}(U_2^j) - \cdots - \hat m_d^{SBF}(U_d^j) \big)$
$\qquad \times K_{1,h_1}(x_1, X_1^i)\, K_{2,h_2}(U_2^j, X_2^i) \cdots K_{d,h_d}(U_d^j, X_d^i).$

In practical implementation, the values $U_k^j$ can be chosen from a finite grid
of equidistant points. Then the algorithm has to update the function values
of the additive components on this grid.

In another version, one generates independent $U_{\ell,i,j}$ for
$\ell = 2, \ldots, d$, $i = 1, \ldots, n$, $j = 1, \ldots, J$, where $U_{\ell,i,j}$
has density $K_{\ell,h_\ell}(\cdot, X_\ell^i)$. Again, in practical
implementation, the values of these random variables can be chosen from a
finite grid of equidistant points. Then the smooth backfitting estimator at
$x_1$ is calculated by

$\hat m_1^{SBF}(x_1) \approx \arg\min_{\theta} \sum_{i=1}^n \sum_{j=1}^J \tau_\alpha\big( Y^i - \theta - \hat m_0^{SBF} - \hat m_2^{SBF}(U_{2,i,j}) - \cdots - \hat m_d^{SBF}(U_{d,i,j}) \big) K_{1,h_1}(x_1, X_1^i).$

This means that the smooth backfitting estimator can be calculated by an
algorithm that is designed for the ordinary backfitting with sample
$(Y^i, X_1^i, U_{2,i,j}, \ldots, U_{d,i,j})$ for $i = 1, \ldots, n$ and
$j = 1, \ldots, J$. In this case, the speed of the algorithm for the smooth
backfitting behaves as that for the ordinary backfitting with sample size $Jn$.

In the last algorithm, the values $U_{\ell,i,j}$ could be replaced by
deterministic choices such that for fixed $i$ and $\ell$ the probability density
$K_{\ell,h_\ell}(\cdot, X_\ell^i)$ puts equal mass between neighboring points of
$U_{\ell,i,j}$, that is,

$\int_{-\infty}^{U_{\ell,i,j}} K_{\ell,h_\ell}(x_\ell, X_\ell^i)\,dx_\ell = j/(J+1), \quad j = 1, \ldots, J.$

Suppose that $K_{\ell,h_\ell}(\cdot, z)$ is symmetric about $z$. Then the algorithm
calculates the ordinary backfitting estimates when $J = 1$, since in that case
$U_{\ell,i,1} = X_\ell^i$. It also approximates the smooth backfitting estimates
as $J \to \infty$. Thus, there exists a broad band of compromises between the
ordinary backfitting and the smooth backfitting for intermediate choices of $J$.
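The deterministic choice of the points $U_{\ell,i,j}$ can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses the Epanechnikov kernel in its interior convolution form $K_h(x - z) = h^{-1} K((x - z)/h)$ without boundary correction, and the function names are hypothetical.

```python
def epanechnikov_cdf(t):
    """CDF of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return 0.25 * (2.0 + 3.0 * t - t ** 3)


def kernel_quantile(p, tol=1e-12):
    """Solve epanechnikov_cdf(t) = p for t in [-1, 1] by bisection."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if epanechnikov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def deterministic_points(x_obs, h, J):
    """Points U_1 < ... < U_J with kernel mass j / (J + 1) to the left of U_j.

    Assumes the interior, convolution form of the kernel centered at x_obs,
    i.e. no boundary correction.
    """
    return [x_obs + h * kernel_quantile(j / (J + 1)) for j in range(1, J + 1)]


single = deterministic_points(x_obs=0.3, h=0.5, J=1)   # reduces to the observation
several = deterministic_points(x_obs=0.3, h=0.5, J=4)  # spread around the observation
```

With `J=1` the single point coincides with the observation, which matches the remark that the scheme then reproduces ordinary backfitting; larger `J` spreads the points over the kernel window.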

4. Simulation study. In this section, we illustrate the asymptotic
equivalence asserted in Proposition 2.1. We compared the numerical properties
of the ordinary backfitting (BF) and the smooth backfitting (SBF) estimators
defined at (2.2) and (2.3) with their theoretical mean regression versions
defined at (2.7) and (2.8), respectively.

In the simulation, we considered the following model:

$Y^i = f_1(X_1^i) + f_2(X_2^i) + f_3(X_3^i) + \{ \sigma_1(X_1^i) + \sigma_2(X_2^i) + \sigma_3(X_3^i) \}\, U^i,$

where $U^i$ are i.i.d. $N(0,1)$, $f_1(x_1) = x_1^{[\ldots]}$,
$f_2(x_2) = \sin(\pi x_2)$, $f_3(x_3) = 2 \exp(-16 x_3^2)$,
$\sigma_1(x_1) = \cos(x_1)$, $\sigma_2(x_2) = \exp(x_2)$ and
$\sigma_3(x_3) = \exp(x_3)$. With this model, the centered version of the $j$th
additive component of the $\alpha$-quantile function equals

$m_j(x_j; \alpha) = c_j + f_j(x_j) + \sigma_j(x_j)\, \Phi^{-1}(\alpha),$

where $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of the standard normal
distribution and $c_j$ is the constant that makes $E\, m_j(X_j; \alpha) = 0$. We
considered two different cases for the distribution of $X^i$. One was the case
where the components of $X^i$ were independent. In this case, $X^i$ were
generated from $N_3(0, \mathbf{I})$ truncated outside $[-1, 1]^3$, where
$\mathbf{I}$ denotes the identity matrix of dimension $d = 3$. This means the
density of $X^i$ was
$f_X(x) = \varphi(x)\, I(x \in [-1,1]^3) / \int_{[-1,1]^3} \varphi(z)\,dz$, where
$\varphi$ denotes the density function of $N_3(0, \mathbf{I})$. The second was the
case where the components of $X^i$ were correlated. In this case,
$X^i \sim N_3(0, V)$ truncated outside $[-1, 1]^3$, where $V = (v_{ij})$ has
$v_{ii} = 1$ and $v_{ij} = 0.9$ for $i \ne j$. Because of the truncation, the
actual correlation equals 0.644. The sample sizes were $n = 200$ and $n = 500$.
These relatively large sample sizes were considered to let the asymptotic
results in Section 2 be well in effect.
Table 1
Mean integrated squared errors of the estimators

Sample size   Distribution of X   Method   $\alpha = 0.2$   $\alpha = 0.5$   $\alpha = 0.8$
n = 200       Uncorrel.           BF       0.09345          0.07457          0.08770
                                  BF*      0.09585          0.07512          0.08208
                                  SBF      0.08818          0.07039          0.08209
                                  SBF*     0.09436          0.07455          0.07937
              Correl.             BF       0.09043          0.07165          0.08382
                                  BF*      0.09864          0.07539          0.08276
                                  SBF      0.08555          0.06712          0.07937
                                  SBF*     0.09136          0.07140          0.08412
n = 500       Uncorrel.           BF       0.05240          0.04020          0.04881
                                  BF*      0.04959          0.04121          0.04729
                                  SBF      0.04905          0.03827          0.04557
                                  SBF*     0.05045          0.04178          0.04896
              Correl.             BF       0.05463          0.04182          0.05094
                                  BF*      0.05137          0.04305          0.05312
                                  SBF      0.05186          0.03983          0.04743
                                  SBF*     0.05496          0.04221          0.05296

Note: BF* denotes the theoretical mean regression ordinary backfitting estimator, and
SBF* denotes the theoretical mean regression smooth backfitting estimator.



Implementation of the ordinary and smooth backfitting methods requires
optimization involving the nonsmooth function $\tau_\alpha$. For this, we used
the R function rq() in the library quantreg. For the smooth backfitting, we
discretized the integrals on a fine grid in $[-1, 1]^3$. We used

(4.1)  $K_{j,g}(x, u) = \ldots$

where $K$ is the Epanechnikov kernel given by
$K(u) = (3/4)(1 - u^2)\, I_{[-1,1]}(u)$. For the bandwidths, we took
$h_1 = h_2 = h_3 = h$ for simplicity. Normalization was done in each iteration so
that $\int \hat m_j(x_j) \hat f_{X_j}(x_j)\,dx_j = 0$. Note that we used estimates
of $f_{X_j}$ in the normalization, instead of fixed weight functions which we
considered in our theoretical development for simplicity. Using a different
weight function changes the estimator only by an additive constant. To get
the density estimates $\hat f_{X_j}$, we used the same kernel $K$ and the
bandwidth $h$ that we employed for quantile estimation. We chose the initial
estimates in the iterative algorithms (2.2), (2.3), (2.7) and (2.8) to be zero.
It was found that the algorithms converged with this initial choice in all
cases.
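For orientation, here is a minimal sketch of one boundary-corrected kernel built from the Epanechnikov kernel. The normalized form $K_h(x - u) / \int_a^b K_h(v - u)\,dv$ used below is an assumption chosen so that the condition $\int K_{j,g}(v_j, u_j)\,dv_j = 1$ of (A8) holds on $[a, b] = [-1, 1]$; it is not claimed to be the exact definition (4.1). All function names are hypothetical.

```python
def epanechnikov(u):
    """K(u) = (3/4) * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def epanechnikov_cdf(t):
    """CDF of the Epanechnikov kernel, with the argument clamped to [-1, 1]."""
    t = max(-1.0, min(1.0, t))
    return 0.25 * (2.0 + 3.0 * t - t ** 3)


def boundary_kernel(x, u, h, a=-1.0, b=1.0):
    """Assumed normalized kernel: K_h(x - u) / integral_a^b K_h(v - u) dv."""
    if not a <= x <= b:
        return 0.0
    mass_inside = epanechnikov_cdf((b - u) / h) - epanechnikov_cdf((a - u) / h)
    return epanechnikov((x - u) / h) / (h * mass_inside)


def integrate_over_x(u, h, a=-1.0, b=1.0, n_grid=20000):
    """Midpoint rule for the integral of boundary_kernel(., u, h) over [a, b]."""
    step = (b - a) / n_grid
    return sum(boundary_kernel(a + (k + 0.5) * step, u, h) for k in range(n_grid)) * step


mass_boundary = integrate_over_x(u=0.9, h=0.5)  # observation near the right edge
mass_interior = integrate_over_x(u=0.0, h=0.5)  # observation in the interior
```

In the interior the normalizing mass equals one and the kernel reduces to the convolution form required in the first part of (A8); near the edge the division restores the unit integral over the first argument.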
Table 1 shows Monte Carlo estimates, based on 200 pseudo-samples, of the
mean integrated squared errors,

$\mathrm{MISE} = E \int \{ \hat m_1(x_1) + \hat m_2(x_2) + \hat m_3(x_3) - m_1(x_1) - m_2(x_2) - m_3(x_3) \}^2 f_X(x)\,dx,$

where $f_X$ is the density function of $X^i$, and $\hat m_j$ represents
$\hat m_j^{BF}$, $\hat m_j^{SBF}$, $\hat m_j^{*,BF}$ or $\hat m_j^{*,SBF}$. For each
estimator, its MISE was estimated by
$\overline{\mathrm{ISE}} = \sum_{r=1}^{200} \mathrm{ISE}_r / 200$, where
$\mathrm{ISE}_r$ is the value of the integrated squared error

$\int \{ \hat m_1(x_1) + \hat m_2(x_2) + \hat m_3(x_3) - m_1(x_1) - m_2(x_2) - m_3(x_3) \}^2 f_X(x)\,dx$

for the $r$th sample. We computed the estimates of the additive regression
function with bandwidths on a grid in $[0.1, 1.5]$. The values for
$\hat m^{BF}$ and $\hat m^{*,BF}$ reported in the table are for the bandwidths
that gave optimal performance of $\hat m^{BF}$, and likewise those for
$\hat m^{SBF}$ and $\hat m^{*,SBF}$ are for the bandwidths that gave optimal
performance of $\hat m^{SBF}$. In most cases, the estimated MISE was minimized
around $h = 0.5$ when $n = 200$, and around $h = 0.4$ when $n = 500$. This is
roughly consistent with the theory that the size of the optimal bandwidth is
of order $n^{-1/5}$ for univariate smoothing, according to which the ratio of
the optimal bandwidths for $n = 200$ and $n = 500$ equals
$(500/200)^{1/5} \approx 1.20$.

Comparing $\hat m^{BF}$ and $\hat m^{SBF}$ with their theoretical mean regression
counterparts $\hat m^{*,BF}$ and $\hat m^{*,SBF}$, we find that the two corresponding MISE values
are very close, and that in most cases the differences get smaller as $n$
increases. This supports our theory presented in Section 2. In the table, we
also find that the estimated MISE for $n = 500$ is nearly half of the
corresponding value for $n = 200$. This supports the fact that the ordinary
and smooth backfitting estimators enjoy the univariate rate of convergence
$n^{-4/5}$ in MISE, since $(500/200)^{4/5} \approx 2.08$.
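
The two rate comparisons used in this discussion are simple arithmetic and can be checked directly. The short Python sketch below (the function names are illustrative only) reproduces the ratios implied by a bandwidth of order $n^{-1/5}$ and an MISE of order $n^{-4/5}$.

```python
def bandwidth_ratio(n_small, n_large):
    """Ratio h(n_small) / h(n_large) when h is proportional to n^(-1/5)."""
    return (n_large / n_small) ** (1 / 5)

def mise_ratio(n_small, n_large):
    """Ratio MISE(n_small) / MISE(n_large) when MISE is proportional to n^(-4/5)."""
    return (n_large / n_small) ** (4 / 5)

print(round(bandwidth_ratio(200, 500), 2))  # 1.2
print(round(mise_ratio(200, 500), 2))       # 2.08
```

For comparison, the empirically best bandwidths quoted above give the ratio 0.5/0.4 = 1.25, which is close to the theoretical value.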

According to Table 1, the MISE values of the estimators at $\alpha = 0.5$ are
always smaller than those at $\alpha = 0.2$ and $\alpha = 0.8$. Note that, in Theorem 2.2,
$f^w_{X_j}(x_j)$ is nothing else than the joint density of $(\varepsilon, X_j)$ at the point $(0, x_j)$.
Under our simulation model, the conditional density can be expressed as



1 / $- x (a) 



01 (zi) + 0-2(2:2) + 0-3(^3) V^iC^i) + 02(^2) + 03OE3) 
x fx(x)dx-j 



for $j = 1, 2$ and $3$, where $\phi$ denotes the density of the standard normal
distribution. According to Theorem 2.2, this implies that the theoretical value
of the integrated variance increases as $\alpha$ moves away from 0.5. This explains
why we have larger MISE values for $\alpha$ away from 0.5. Similar numerical
evidence was also observed by Yu and Jones (1998) and Lee, Lee and Park
(2006).

Fig. 1. Normal Q-Q plots for $\hat m_2^{BF}(x)$ (panel BF) and $\hat m_2^{SBF}(x)$ (panel SBF) based on 200 values computed from
pseudo-samples in the case where $x = 0$, $\alpha = 0.5$, $n = 200$ and the components of $X^i$ were
correlated. The theoretical quantiles are on the horizontal axis and the sample quantiles
are on the vertical axis.

Figure 1 illustrates the asymptotic normality of $\hat m_j^{BF}$ and $\hat m_j^{SBF}$. It depicts
the normal Q-Q plots of the 200 values of $\hat m_2^{BF}(x)$ and $\hat m_2^{SBF}(x)$ at $x = 0$
when $\alpha = 0.5$ and $n = 200$. The figure is for the case where the components
of $X^i$ are correlated. Although it exhibits slight departures from normality
in the tails, the figure suggests that the distributions of the estimators get close
to normal even for moderate sample sizes. We obtained other Q-Q plots
that corresponded to other components $j$, other points $x$ or other quantile
levels $\alpha$, and also repeated them in other simulation models. They did not
look much different from the case we report here.

Figure 2 illustrates what the four curve estimates $\hat m_j^{BF}$, $\hat m_j^{*,BF}$, $\hat m_j^{SBF}$ and
$\hat m_j^{*,SBF}$ computed from a single typical sample look like. In the top two
panels, the long-dashed and dotted curves, respectively, represent $\hat m_j^{BF}$ and
$\hat m_j^{*,BF}$ computed from a sample for which the value of the integrated squared
error

$$\int \{\hat m_j^{BF}(x_j) - m_j(x_j)\}^2\,dx_j$$

was the median of those values obtained from the 200 pseudo-samples.
Similarly, the bottom two panels depict $\hat m_j^{SBF}$ and $\hat m_j^{*,SBF}$ computed from a
sample that gave the median performance in terms of the integrated squared
error

$$\int \{\hat m_j^{SBF}(x_j) - m_j(x_j)\}^2\,dx_j.$$

In the figure the solid curves represent the true functions. Comparing
the pairs, $\hat m_j^{BF}$ versus $\hat m_j^{*,BF}$ and $\hat m_j^{SBF}$ versus $\hat m_j^{*,SBF}$, we find that the
two corresponding curves are closer to each other than to the true
function, although there are some places where they are more distant in the
case of the backfitting estimator for $\alpha = 0.2$ (top left panel). The figure is
for the estimates of the second component function when $n = 500$ and the
components of $X^i$ were correlated. Those for other cases gave a similar lesson
and are not included here.

One may also be interested in comparing the two backfitting quantile
estimators $\hat m^{BF}$ and $\hat m^{SBF}$ in terms of MISE. For this, we computed the
standard errors of the differences between the estimated values of MISE
of the respective estimators. In Table 2, we provide the average differences
$\overline{\mathrm{DIFF}}$ and their standard errors calculated by the formula

$$\sqrt{\sum_{r=1}^{200} (\mathrm{DIFF}_r - \overline{\mathrm{DIFF}})^2 / (199 \times 200)},$$

where $\overline{\mathrm{DIFF}}$ denotes the average of $\mathrm{DIFF}_r$ over 200 pseudo-samples, and

$$\mathrm{DIFF}_r = (\text{ISE of } \hat m^{BF} \text{ for the } r\text{th sample}) - (\text{ISE of } \hat m^{SBF} \text{ for the } r\text{th sample}).$$

Fig. 2. Estimates of a component function computed from a sample that gave the median
performance in terms of the integrated squared error of $\hat m_2^{BF}$ or $\hat m_2^{SBF}$, when $n = 500$ and
the covariates were correlated. Long-dashed and dotted curves in the top two panels are
$\hat m_2^{BF}$ and $\hat m_2^{*,BF}$, respectively, and those in the bottom two panels are $\hat m_2^{SBF}$ and $\hat m_2^{*,SBF}$.
Left two panels are for the case $\alpha = 0.2$ and the right are for $\alpha = 0.5$. Solid curves represent
the true component functions. Panels: BF ($\alpha = 0.2$), BF ($\alpha = 0.5$), SBF ($\alpha = 0.2$), SBF ($\alpha = 0.5$);
the horizontal axis runs from -1.0 to 1.0.
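
The standard error of a mean paired difference can be computed as in the following Python sketch. It is only an illustration: the function name is hypothetical, and the inputs stand for the per-sample ISE values of the two estimators, which are not reproduced here.

```python
import numpy as np

def paired_diff_summary(ise_bf, ise_sbf):
    """Mean paired difference and its standard error over R pseudo-samples."""
    diff = np.asarray(ise_bf, dtype=float) - np.asarray(ise_sbf, dtype=float)
    r = diff.size
    mean = diff.mean()
    # sqrt( sum (DIFF_r - mean)^2 / ((R - 1) * R) ); with R = 200 this is 199 x 200.
    se = np.sqrt(np.sum((diff - mean) ** 2) / ((r - 1) * r))
    return mean, se
```

Because the two ISE values in each pair come from the same pseudo-sample, working with the differences takes their correlation into account, which a comparison of two separate standard errors would not.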

Comparing the two backfitting quantile estimators, we find that the smooth
backfitting estimators have smaller values of the estimated MISE than the
ordinary backfitting estimators in all cases. In particular, all the differences
are statistically significant, exceeding two standard errors. Although not
reported in the paper, we also compared the two backfitting quantile
estimators with their oracle versions. An oracle estimator of an additive component
is the one obtained by using the true functions for the other components. We
found that in all cases the two backfitting quantile estimators performed
similarly to their oracle versions.

Table 2
Differences in mean integrated squared errors of BF and SBF estimators

Sample size   Distribution of X   α = 0.2      α = 0.5      α = 0.8
n = 200       Uncorrel.           0.00527      0.00418      0.00561
                                  (0.00099)    (0.00063)    (0.00087)
              Correl.             0.00488      0.00453      0.00445
                                  (0.00098)    (0.00068)    (0.00096)
n = 500       Uncorrel.           0.00335      0.00193      0.00324
                                  (0.00045)    (0.00028)    (0.00038)
              Correl.             0.00277      0.00199      0.00351
                                  (0.00042)    (0.00034)    (0.00042)

Note: the numbers are averages of (ISE of $\hat m^{BF}$) - (ISE of $\hat m^{SBF}$) over 200 pseudo-samples,
and their standard errors are given in the parentheses.

5. Proofs. 

5.1. Proof of Proposition 2.1. We only give the proof for the ordinary
backfitting estimator. The proof will be given for $a_1 + C_S n^{-1/5} < x_1 < b_1 - C_S n^{-1/5}$.
The proofs for the smooth backfitting estimator and for boundary
points follow by similar arguments. For simplicity of notation, we also assume
that $d = 2$.

The basic asymptotic argument for a treatment of parametric and non- 
parametric quantile estimators is a Bahadur expansion. It states that the 
quantile estimator is asymptotically equivalent to a linear statistic, that 
is, to a sum of independent variables. This expansion would directly carry
over to our case if the pilot functions (input) of the backfitting algorithms
were nonrandom. Because this is not the case, we have to generalize
the Bahadur approach. We have to show that the Bahadur expansion holds 
uniformly over a class of pilot functions. Furthermore, we have to verify that 
the pilot estimators lie in this function class with probability tending to one. 
The latter is guaranteed by the assumptions (A5) and (A6). The uniform 
expansion is the main step of our proof. 

Define

$$\begin{aligned}
V_i(\theta, \mu_2, x_1) = K_{1,h_1}(x_1, X_1^i)\bigl[&\tau_\alpha(Y^i - \theta - \mu_2(X_2^i)) - \tau_\alpha(\varepsilon^i + m_1(X_1^i) - m_1(x_1)) \\
&- (\theta - m_1(x_1) + \mu_2(X_2^i) - m_2(X_2^i)) \\
&\quad\times (I(\varepsilon^i + m_1(X_1^i) - m_1(x_1) < 0) - \alpha)\bigr].
\end{aligned}$$

Let $J_1 = J_1(x_1)$ and $J_2 = J_2(x_1)$ be index sets defined by

$$J_1 = \{i : |X_1^i - x_1| < C h_1,\ a_2 + C_S n^{-1/5} < X_2^i < b_2 - C_S n^{-1/5}\},$$

$$J_2 = \{i : |X_1^i - x_1| < C h_1,\ a_2 < X_2^i < a_2 + C_S n^{-1/5} \text{ or } b_2 - C_S n^{-1/5} < X_2^i < b_2\}.$$

Put

$$\begin{aligned}
D(\theta, \mu_2, x_1) &= \sum_{i=1}^n [V_i(\theta, \mu_2, x_1) - E^{\mathcal X} V_i(\theta, \mu_2, x_1)] \\
&= \sum_{i \in J_1} [V_i(\theta, \mu_2, x_1) - E^{\mathcal X} V_i(\theta, \mu_2, x_1)] + \sum_{i \in J_2} [V_i(\theta, \mu_2, x_1) - E^{\mathcal X} V_i(\theta, \mu_2, x_1)] \\
&= D_1(\theta, \mu_2, x_1) + D_2(\theta, \mu_2, x_1),
\end{aligned}$$

where $E^{\mathcal X}$ is the conditional expectation given $\mathcal X = \{X^1, \ldots, X^n\}$. Let $M_1$
and $M_2$ denote the numbers of elements of $J_1$ and $J_2$, respectively. These are
random variables. Since $h_1$ is of order $n^{-1/5}$ and the density $f_X$ is strictly
positive on its support, $M_1$ is of order $n \times n^{-1/5} = n^{4/5}$ and $M_2$ is of order
$n \times n^{-1/5} \times n^{-1/5} = n^{3/5}$. Thus, there exist constants $C_1 > 0$ and $C_2 > 0$ such
that $C_1 n^{4/5} \le M_1 \le 2 C_1 n^{4/5}$ and $C_2 n^{3/5} \le M_2 \le 2 C_2 n^{3/5}$ with probability
tending to one.

For a fixed constant $D > 0$, we now introduce the class $\mathcal M_n$ of all tuples
of a parameter $\theta \in \Theta$ and a function $g$ that fulfills

$$\sup_{a_2 + C_S n^{-1/5} < x_2 < b_2 - C_S n^{-1/5}} |g(x_2) - m_2(x_2)| \le D n^{-\{(1+\rho)/(2+3\rho)\} 4/5 - \Delta_1}$$

and whose derivative fulfills a Lipschitz condition of order $\rho$ with Lipschitz
constant $C$ as in (A6).

For $j \ge 0$, let $\mathcal M_n(2^{-j})$ denote a grid of points in $\mathcal M_n$ such that for
every $(\theta, g) \in \mathcal M_n$ there exists $(\theta^*, g^*) \in \mathcal M_n(2^{-j})$ with $|\theta^* - \theta| \le 2^{-j}$ and
$\|g^* - g\|_\infty \le 2^{-j}$. Let $N_j$ denote the number of points in the grid $\mathcal M_n(2^{-j})$.
Note that $N_j = O\{\exp(2^{j/(1+\rho)} n^{\kappa/(1+\rho)})\}$.

We apply the Bernstein inequality. For a sum of $r$ independent random
variables $V_i$ that are absolutely bounded by a constant $k$ and have finite
variance bounded by $\sigma^2$, this inequality states that

$$\begin{aligned}
P\Bigl(\Bigl| r^{-1/2} \sum_{i=1}^r (V_i - E V_i) \Bigr| > a\Bigr)
&\le 2 \exp\Bigl(-\frac{a^2}{2 a k r^{-1/2} + 2\sigma^2}\Bigr) \\
&\le 2 \exp\Bigl(-\frac{a}{4 k r^{-1/2}}\Bigr) + 2 \exp\Bigl(-\frac{a^2}{4\sigma^2}\Bigr).
\end{aligned}$$
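
The second bound in the Bernstein inequality above is elementary. The following short derivation is added as a reading aid and is not part of the original argument; the shorthand $u = 2 a k r^{-1/2}$ and $v = 2\sigma^2$, with $k$ the bound on $|V_i|$, is introduced only for this note.

```latex
\[
u + v \le 2\max(u, v)
\quad\Longrightarrow\quad
\frac{a^2}{u + v} \ge \min\Bigl(\frac{a^2}{2u}, \frac{a^2}{2v}\Bigr)
= \min\Bigl(\frac{a}{4 k r^{-1/2}}, \frac{a^2}{4\sigma^2}\Bigr),
\]
\[
\exp\Bigl(-\frac{a^2}{u + v}\Bigr)
\le \max\Bigl\{\exp\Bigl(-\frac{a}{4 k r^{-1/2}}\Bigr),\ \exp\Bigl(-\frac{a^2}{4\sigma^2}\Bigr)\Bigr\}
\le \exp\Bigl(-\frac{a}{4 k r^{-1/2}}\Bigr) + \exp\Bigl(-\frac{a^2}{4\sigma^2}\Bigr).
\]
```

Splitting the bound in this way lets the bounded-increment term and the variance term be controlled separately in the chaining argument.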



We apply this inequality with a chaining argument for $D_1(\theta, \mu, x_1)$ and
$D_2(\theta, \mu, x_1)$. In doing this, we take $r = M_1$ (or $r = M_2$, resp.) and $P = P^{\mathcal X}$ where
$P^{\mathcal X}$ is the conditional distribution given $\mathcal X = \{X^1, \ldots, X^n\}$. Let $J_n$ be chosen
so that $2^{-J_n} \le n^{-2/5-\delta} < 2^{-J_n+1}$ with $\delta > 0$ small enough, see below. Define
$\gamma = 4(1 + \rho)/[5(2 + 3\rho)]$ and $I_n = \{j : j \le J_n,\ D n^{-\gamma-\Delta_1} \ge 2^{-j}\}$. Furthermore,
for $(\theta, \mu) \in \mathcal M_n(2^{-J_n})$ choose $(\theta^j, \mu^j) \in \mathcal M_n(2^{-j})$ with $|\theta^j - \theta| \le 2^{-j}$ and
$\|\mu^j - \mu\|_\infty \le 2^{-j}$. For $j = J_n$, we choose $(\theta^j, \mu^j) = (\theta, \mu)$. We do not indicate
the dependence of $(\theta^j, \mu^j)$ on $(\theta, \mu)$ in the notation. For $j < j_n = \min I_n$,
the grid $\mathcal M_n(2^{-j})$ can be chosen so that it contains only one value of $\mu$.
We assume that this value is equal to $\mu^j = m_2$. Furthermore, we choose
$\theta^0 = m_1(x_1)$ and we assume w.l.o.g. that the diameter of $\Theta$ is less than one.
For $j = 0$, the grid $\mathcal M_n(2^{-j})$ contains only one value which we choose to be
$(\theta^0, \mu^0)$. Then



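$$\begin{aligned}
&P\Bigl(\sup_{(\theta,\mu) \in \mathcal M_n(2^{-J_n})} |D_1(\theta, \mu, x_1)| > n\, n^{-4/5-2\delta} \Bigm| \mathcal X\Bigr) \\
&\quad\le P\Bigl(\sup_{(\theta,\mu) \in \mathcal M_n(2^{-J_n})} \Bigl| D_1(\theta^0, \mu^0, x_1)
+ \sum_{1 \le j < j_n} [D_1(\theta^j, \mu^j, x_1) - D_1(\theta^{j-1}, \mu^{j-1}, x_1)] \\
&\qquad\qquad + \sum_{j_n \le j \le J_n} [D_1(\theta^j, \mu^j, x_1) - D_1(\theta^{j-1}, \mu^{j-1}, x_1)] \Bigr| > n\, n^{-4/5-2\delta} \Bigm| \mathcal X\Bigr).
\end{aligned}$$

Let $s_j$ be positive numbers (depending on $n$) such that $\sum_{1 \le j \le J_n} s_j \le 1/2$.
Then the right-hand side of the above inequality is bounded by

$$\begin{aligned}
&P(|D_1(\theta^0, \mu^0, x_1)| > 2^{-1} n\, n^{-4/5-2\delta} \mid \mathcal X) \\
&\quad + \sum_{1 \le j < j_n} 2^{2j} \sup{}^{*} P(|D_1(\theta^j, \mu^j, x_1) - D_1(\theta^{j-1}, \mu^{j-1}, x_1)| > s_j n\, n^{-4/5-2\delta} \mid \mathcal X) \\
&\quad + \sum_{j_n \le j \le J_n} N_j N_{j-1} \sup{}^{**} P(|D_1(\theta^j, \mu^j, x_1) - D_1(\theta^{j-1}, \mu^{j-1}, x_1)| > s_j n\, n^{-4/5-2\delta} \mid \mathcal X),
\end{aligned} \tag{5.1}$$

where $\sup^*$ and $\sup^{**}$ run over all $(\theta^j, \mu^j) \in \mathcal M_n(2^{-j})$ and $(\theta^{j-1}, \mu^{j-1}) \in \mathcal M_n(2^{-j+1})$
with $|\theta^j - \theta^{j-1}| \le 2^{-j+1}$ and $\|\mu^j - \mu^{j-1}\|_\infty \le 2^{-j+1}$.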

Using the Bernstein inequality with $k = O(2^{-j} h_1^{-1})$, $\sigma^2 = 2^{-2j} O(n^{-\gamma-\Delta_1} \times h_1^{-2})$
and $a = M_1^{-1/2} n s_j n^{-4/5-2\delta}$, the last sum in (5.1) can be bounded by

$$\begin{aligned}
\sum_{j_n \le j \le J_n} \bigl[&\exp(d_1 2^{j/(1+\rho)} n^{\kappa/(1+\rho)} - d_2 s_j n\, n^{-4/5-2\delta} 2^j h_1) \\
&+ \exp(d_1 2^{j/(1+\rho)} n^{\kappa/(1+\rho)} - d_2 s_j^2 M_1^{-1} n^2 n^{-8/5-4\delta} 2^{2j} n^{\gamma+\Delta_1} h_1^2)\bigr]
\end{aligned} \tag{5.2}$$

for some constants $d_1, d_2 > 0$. Choosing $s_j = (d_3 \log n)^{-1}$ with $d_3$ large enough,
the sum at (5.2) can be bounded further by

$$\exp(-d_4 n^{d_5}) + \exp(-d_6 M_1^{-1} n^{4/5+d_7}),$$

where $d_4, \ldots, d_7 > 0$ are some constants. Here, we used that $\delta > 0$ is small
enough. Using similar arguments for the first two terms in (5.1), one can
bound the sum of all three terms in (5.1) by

$$\exp(-d_8 n^{d_9}),$$

where $d_8, d_9 > 0$ are some constants. This exponential bound entails that for
$\delta > 0$ small enough

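$$\begin{aligned}
\sup_{(\theta,\mu_2) \in \mathcal M_n(2^{-J_n}),\, x_1 \in I_1} |n^{-1} D_1(\theta, \mu_2, x_1)|
&= \sup_{(\theta,\mu_2) \in \mathcal M_n(2^{-J_n}),\, x_1 \in I_1} \Bigl| n^{-1} \sum_{i \in J_1} \{V_i(\theta, \mu_2, x_1) - E^{\mathcal X} V_i(\theta, \mu_2, x_1)\} \Bigr| \\
&= o_P(n^{-4/5-\delta}).
\end{aligned} \tag{5.3}$$

Similarly, it can be shown that

$$\begin{aligned}
\sup_{(\theta,\mu_2) \in \mathcal M_n(2^{-J_n}),\, x_1 \in I_1} |n^{-1} D_2(\theta, \mu_2, x_1)|
&= \sup_{(\theta,\mu_2) \in \mathcal M_n(2^{-J_n}),\, x_1 \in I_1} \Bigl| n^{-1} \sum_{i \in J_2} \{V_i(\theta, \mu_2, x_1) - E^{\mathcal X} V_i(\theta, \mu_2, x_1)\} \Bigr| \\
&= o_P(n^{-4/5-\delta}).
\end{aligned} \tag{5.4}$$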
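We now use a Taylor expansion of $E^{\mathcal X} V_i(\theta, \mu_2, x_1)$ with respect to $\theta$. Note
that with $A^i = \varepsilon^i + m_1(X_1^i) - m_1(x_1)$ and $B^i = Y^i - \theta - \mu_2(X_2^i) = \varepsilon^i +
m_1(X_1^i) - \theta + m_2(X_2^i) - \mu_2(X_2^i)$

$$V_i(\theta, \mu_2, x_1) = K_{1,h_1}(x_1, X_1^i) \times
\begin{cases}
0, & \text{if } A^i, B^i < 0, \\
0, & \text{if } A^i, B^i \ge 0, \\
-B^i, & \text{if } A^i \ge 0 > B^i, \\
B^i, & \text{if } A^i < 0 \le B^i.
\end{cases}$$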

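For $\delta_1, \delta_2 > 0$ small enough, we get that uniformly for $|\theta - m_1(x_1)| \le \delta_1$

$$\begin{aligned}
E^{\mathcal X} V_i(\theta, \mu_2, x_1)
= \tfrac{1}{2} K_{1,h_1}(x_1, X_1^i) \{&f_{\varepsilon|X}(0 \mid X^i)[m_2(X_2^i) - \mu_2(X_2^i) - \theta + m_1(x_1)]^2 \\
&+ o_P(n^{-4/5-\delta_2}) + O_P(|\theta - m_1(x_1)|^3)\},
\end{aligned}$$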

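see (A3). We now apply (5.3), (5.4) and the fact that the change of an
empirical quantile cannot be larger than the largest change of an observation.
We use these results to analyze the update $\hat m_1^{BF}(x_1)$ when we plug into
the iteration formula (2.2) of the backfitting estimator a choice of $\mu_2 =
\hat m_2^{BF}$ that lies in $\mathcal M_n$. By a direct argument, it can be shown that with
probability tending to one the resulting value lies in a $\delta_1$-neighborhood of
$m_1(x_1)$. Thus, using the above expansions, we get that, up to terms of order
$O_P(n^{-2/5-\delta_3})$ with $\delta_3 > 0$ small enough, the resulting value for the update
$\hat m_1^{BF}(x_1)$ is equal to the minimizer over $\theta$ of

$$\begin{aligned}
&\frac{\theta}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i)[I(\varepsilon^i + m_1(X_1^i) - m_1(x_1) < 0) - \alpha] \\
&\quad + \frac{1}{2n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i) f_{\varepsilon|X}(0 \mid X^i)[m_2(X_2^i) - \mu_2(X_2^i) - \theta + m_1(x_1)]^2.
\end{aligned}$$

The minimizer of this expression is equal to

$$\begin{aligned}
m_1(x_1) &- \hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i)[I(\varepsilon^i + m_1(X_1^i) - m_1(x_1) < 0) - \alpha] \\
&+ \hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i) f_{\varepsilon|X}(0 \mid X^i)[m_2(X_2^i) - \mu_2(X_2^i)],
\end{aligned}$$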

1=1 

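where $\hat f^w_{X_j}(x_j)$ has been defined after (2.8). We now use that

$$\begin{aligned}
&\hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i)[I(\varepsilon^i + m_1(X_1^i) - m_1(x_1) < 0) - \alpha] \\
&\quad = \hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i)[I(\varepsilon^i < 0) - \alpha] \\
&\qquad + m_1(x_1) - \hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i) f_{\varepsilon|X}(0 \mid X^i) m_1(X_1^i)
+ O_P(n^{-2/5-\delta})
\end{aligned}$$

for $\delta > 0$ small enough. This shows that, up to terms of the same order, the minimizer is equal to

$$\begin{aligned}
&\hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i) f_{\varepsilon|X}(0 \mid X^i)[m_1(X_1^i) + m_2(X_2^i) - \mu_2(X_2^i)] \\
&\quad - \hat f^w_{X_1}(x_1)^{-1} \frac{1}{n} \sum_{i=1}^n K_{1,h_1}(x_1, X_1^i)[I(\varepsilon^i < 0) - \alpha].
\end{aligned}$$

This expansion holds uniformly for $x_1 \in I_1$ and $(\theta, \mu_2) \in \mathcal M_n$.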
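
The minimization step in this part of the proof is a one-line first-order condition. The derivation below is a reading aid under stated assumptions: the shorthand $w_i = K_{1,h_1}(x_1, X_1^i)$, $f_i = f_{\varepsilon|X}(0 \mid X^i)$, $c_i = m_2(X_2^i) - \mu_2(X_2^i) + m_1(x_1)$ and $S_n = n^{-1}\sum_i w_i [I(\varepsilon^i + m_1(X_1^i) - m_1(x_1) < 0) - \alpha]$ is introduced only here, and the weighted density estimate defined after (2.8) is assumed to equal $n^{-1}\sum_i w_i f_i$.

```latex
\[
g(\theta) = \theta S_n + \frac{1}{2n} \sum_{i=1}^n w_i f_i (c_i - \theta)^2,
\qquad
g'(\theta) = S_n - \frac{1}{n} \sum_{i=1}^n w_i f_i (c_i - \theta) = 0,
\]
\[
\theta^{*}
= \frac{n^{-1} \sum_i w_i f_i c_i - S_n}{n^{-1} \sum_i w_i f_i}
= m_1(x_1)
+ \frac{n^{-1} \sum_i w_i f_i \,[m_2(X_2^i) - \mu_2(X_2^i)]}{n^{-1} \sum_i w_i f_i}
- \frac{S_n}{n^{-1} \sum_i w_i f_i}.
\]
```

Since $g''(\theta) = n^{-1}\sum_i w_i f_i > 0$ whenever some observation falls in the kernel window, the stationary point $\theta^{*}$ is the unique minimizer of the quadratic.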

To complete the proof, we use the fact that, if one replaces in (2.2) or
(2.7) the input function $\mu_2 = \hat m_2^{BF}$ or $\mu_2 = \hat m_2^{*,BF}$, respectively, by another
function that differs in sup-norm by an amount of order $O_P(n^{-2/5-\Delta_2})$,
then the resulting estimator also changes at most by an amount of order
$O_P(n^{-2/5-\Delta_2})$. In particular, if $\delta < \Delta_2$, this implies that

$$\sup_{a_1 + C_S n^{-1/5} < x_1 < b_1 - C_S n^{-1/5}} |\hat m_1^{BF}(x_1) - \hat m_1^{*,BF}(x_1)| = O_P(n^{-2/5-\delta}).$$

The other statements of Proposition 2.1 can be proved by using similar
arguments.

5.2. Proof of Theorem 2.2. We will prove the theorem for the ordinary
backfitting estimator. A proof for the smooth backfitting estimator follows
along the same lines. We only give an outline of the proof. For simplicity, we
assume that the condition (A6) holds with $\rho = 1$. Our basic argument runs as
follows. We choose $\hat m_j^{*,BF,[0]} = \hat m_j^{BF,[0]}$. By assumption, these starting values
fulfill (A5) and (A6) (with the choice $\hat m_j^{BF} = \hat m_j^{*,BF,[0]} = \hat m_j^{BF,[0]}$). Thus, we
can apply Proposition 2.1 and we get that the updates $\hat m_j^{*,BF,[1]}$ and $\hat m_j^{BF,[1]}$
fulfill (A7) (with the choices $\hat m_j^{*,BF} = \hat m_j^{*,BF,[1]}$ and $\hat m_j^{BF} = \hat m_j^{BF,[1]}$). We will
show below that the updates $\hat m_j^{*,BF,[l]}$ of the mean regression backfitting
estimator fulfill conditions (A5) and (A6) for all $l \ge 1$. With this fact, we
can use an iterative argument. Suppose that we know that (A5)-(A7) hold
for $\hat m_j^{*,BF,[l-1]}$ and $\hat m_j^{BF,[l-1]}$. Then with our proof below we get that $\hat m_j^{*,BF,[l]}$
fulfills (A5) and (A6). By application of Proposition 2.1, we get that (A7)
holds for $\hat m_j^{*,BF,[l]}$ and $\hat m_j^{BF,[l]}$. Thus, $\hat m_j^{BF,[l]}$ lies in a neighborhood of $\hat m_j^{*,BF,[l]}$
and (A5) and (A6) also hold for $\hat m_j^{BF,[l]}$ because they are satisfied by $\hat m_j^{*,BF,[l]}$.
The bound for the distance between $\hat m_j^{*,BF,[l]}$ and $\hat m_j^{BF,[l]}$ adds up. Each
application of Proposition 2.1 adds an additional term. The additional term
increases with $l$. With a careful analysis of the arguments in the proof of
Proposition 2.1, one gets that the bounds in (A5) and (A6) have to be
multiplied by a factor $C_*^l$ with a constant $C_* > 1$. If $l \le C_{\mathrm{iter}} \log n$ with
$C_{\mathrm{iter}} > 0$ small enough, we get

$$\hat m_j^{BF,[l]} - \hat m_j^{*,BF,[l]} = o_P(n^{-2/5}). \tag{5.5}$$
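
The restriction to a number of iterations of order $\log n$ can be made concrete with one line of arithmetic, added here as a reading aid. It is presumably the reason why $C_{\mathrm{iter}}$ has to be small: the accumulated factor then grows only like a small power of $n$.

```latex
\[
l \le C_{\mathrm{iter}} \log n
\quad\Longrightarrow\quad
C_*^{\,l} \le C_*^{\,C_{\mathrm{iter}} \log n}
= \exp\bigl(C_{\mathrm{iter}} \log n \cdot \log C_*\bigr)
= n^{\,C_{\mathrm{iter}} \log C_*}.
\]
```

For a fixed $C_* > 1$, the exponent $C_{\mathrm{iter}} \log C_*$ can be made smaller than any prescribed positive number by choosing $C_{\mathrm{iter}}$ small, so the polynomial slack in the bounds of (A5) and (A6) can absorb it.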

In the second part of the proof, we will show the asymptotic normality of
$\hat m_j^{*,BF,[C \log n]}$ for $C$ large enough. The minimal sufficient value of $C$ for this
result depends on the rate of convergence of $\hat m_j^{*,BF,[0]}$ to $m_j$. If this rate is
$n^{-2/5}$, then it can be made as small as one likes. For slower rates, one needs
larger values of $C$. If the rate is fast enough, one can choose $C \le C_{\mathrm{iter}}$. In this
case, we can apply (5.5) and we get the same asymptotic normality result
for $\hat m_j^{BF,[C_{\mathrm{iter}} \log n]}$. This will conclude the proof of Theorem 2.2.

We now prove that the updates $\hat m_j^{*,BF,[l]}$ fulfill the conditions (A5) and
(A6) for all $l \ge 1$. For this purpose, we rewrite (2.7) as

m- \Xj) — mj{Xj) 

/r r\ ~ \ . ~ *,B / \ . ~ *,C,[l]/ \ ~ *,BF 

(5.6) =m j {Xj)+mj {Xj)+m- l '{Xj)-m ' 

d „ 

- ^ [m^ BF ' [lk,J \x k )-m k (x k )]f^ Xj (x k \x j )dx k , 
where l k j = I + 1 for k < j, l k j = I for k > j, and 
m,' [Xj) = ! — — , 

m •' (xj) = ! ^ , 

' 3 /v, (•'•,) 

d / n 
k=l,^j \ i=l 

x [mf^iXl) - m k (Xi)]j (/f .{ Xj )y l 

d p 

+ ^ / [m k ' BF ' [lk,1 \x k ) -m k (x k )]f^ x .(x k \xj)dx k , 



~ *,C,[l] 



,n,u, ( | v _ f fe\x(Q\u)Kj,h j (Xj,Uj)fx(u)dU- k 

The iteration (5.6) can be analyzed as the smooth backfitting algorithm 

in Mammen, Linton and Nielsen (1999). With rh*^ BF '^\x) = m*' BF '^(xi) + 

" ' + ^' BF ( x d) an< ^ = W-i(xi) + • • • + md(xd), we can write a full 

cycle of iterations (5.6) as 

o *,BF,[J+1] ~ *,A , ~ *,B , ~ *,C,[Z] - *,BF 

m + — m + = + + m e — m 

I rp / * *,BF,[Z] \ . 

where ^m" 4 , and m^'^ are some functions, T ni+ is an operator that 

acts on additive mean zero functions in -^2(/ e |x(0|')/x('))> an( i W = / ("i+ BF '^ 
m + )(x)/x(a;)/ e | j \'(0| a; ) We used © (not +) as subindex in fh*^ A because 
it is not the sum of ■ The operator T nj+ converges to an operator T + 
that is based on an iterative application of the linear transformations for 
the additive components gj of an additive function g + : 

d „ 

9j-*- Yj 9k(x k ) fx k \x 3 ( x k I Xj ) dx k . 

More precisely, the kernel function of T n + converges to the kernel function 
of T + , with respect to the sup-norm. 

Arguing as in the proof of Lemma 1 in Mammen, Linton and Nielsen
(1999), one can show that $T_+$ is a positive self-adjoint operator with operator
norm strictly less than one, $\|T_+\| < 1$, and with $\|T_j m\|_\infty \le D \|m\|_2$ for a
constant $D > 0$. Here, $T_j m$ is the $j$th additive component of $T_+ m$. This
gives with constants $0 < D' < 1$ and $D'' > 0$ for $n$ large enough

$$\|\hat T_{n,+}\| \le D'. \tag{5.8}$$

Furthermore, we have

$$\|\hat T_{n,j} m\|_\infty \le D'' \|m\|_2, \tag{5.9}$$

where $\hat T_{n,j} m$ is the $j$th additive component of $\hat T_{n,+} m$. Iterative application
of (5.7) gives

- *,BF,[Z] t *,A,[l] . 5 *,B,{1] . , *,C,[l] - *,BF . Fpl , 4 *,BF,[0] s 

m + — m+ = m + + m + + — m +-t n+ (m_|_ — m_|_), 

where T nj+ is an extension of T„ i+ to a nonzero mean function by putting 
T n ,+9 = T nt+ (g - fi g ) + fi g with fi g = f g(x)f x {x)f e \ x (0\x) dx, and 

l-i 

- *,A,[Z] V^i=.r ~ *,A 
r=0 
l-l

* *,B,[l] ST^'rr ~ *,B 

ml — > TL , ml , 



r=0 
Z-l 



r=0 
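
The sums over powers of the operator that arise when an affine update is iterated have a simple finite-dimensional analogue. The Python sketch below is a toy illustration only: a matrix `T` with spectral norm below one stands in for the operator, `b` for the fixed input of one cycle, and the function names are hypothetical. It is not the authors' operator or algorithm.

```python
import numpy as np

def iterate_affine(T, b, x0, n_iter):
    """Apply the update x <- b + T x a total of n_iter times, starting from x0."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = b + T @ x
    return x

def neumann_partial_sum(T, b, n_terms):
    """Partial sum  sum_{r=0}^{n_terms-1} T^r b  of the Neumann series."""
    b = np.asarray(b, dtype=float)
    total = np.zeros_like(b)
    term = b.copy()
    for _ in range(n_terms):
        total += term
        term = T @ term
    return total
```

After $l$ steps the iterate equals the partial sum plus $T^l x_0$. When the norm of `T` is below one, the second term decays geometrically, and the iterate converges to the solution of $(I - T)x = b$ regardless of the starting value, which mirrors the role of the norm bound (5.8).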

Using standard bounds on m*' and m *j' B , it can be verified that 

(5.10) sup \m*' A ' [l] (x j )\ = P (n- 2 / 5 ), 

(5.11) sup |m*' S ' [Z1 (^0l = Op(™~ 1/5 ), 

(5.12) sup \m*' B ' [l] (x j )\ = Op(n- 2 / 5 ), 

a,j+Cshj<Xj<bj—Cshj,l>l 

where for an additive function g + we denote by gj its jth additive compo- 
nent. 

We now argue that for a constant Ct > 

(5.13) sup \f njj f l n ~_l(?h*' BF ' [0] - m,j)(xj)\ < C T K n , 



where

$$K_n = \sup_{1 \le j \le d} \Bigl[ \sup_{a_j + C_S h_j < x_j < b_j - C_S h_j} |\hat m_j^{*,BF,[0]} - m_j|(x_j)
+ n^{-1/5} \sup_{a_j < x_j < b_j} |\hat m_j^{*,BF,[0]} - m_j|(x_j) \Bigr].$$

For a proof of this claim, one applies (5.8) and (5.9). Also, we argue that 
(5.14) sup \mf' [ \x j )\=o P (n- 2 ^). 

Xj€l t l>l 

For a proof of (5.14), we note that 

sup \mj'^\xj)\ = op(n~ 2 / 5 ). 

Xj£lj,l>l 

This follows by empirical process theory. One uses the fact that m^' BF '^ ^ — 
mfc lies in a class of functions that have second derivatives absolutely bounded 
by C^vS with £ > being arbitrarily small and constant C,c depending on £. 
This can be shown by using that the same bound applies for and m® , 
and that the kernels of the operators T + and Tj have an absolutely bounded 
second derivative [see (A9)], and then applying an iterative argument. 
The bounds at (5.10)-(5.14) imply that $\hat m_j^{*,BF,[l]}$ fulfills (A5) uniformly
for $l \ge 1$. Using the smoothness considerations in the previous paragraph,
we get that $\hat m_j^{*,BF,[l]}$ fulfills (A6) uniformly for $l \ge 1$. Thus, we get by an
iterative application of Proposition 2.1 that (5.5) holds.

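It remains to show the asymptotic normality result for $\hat m_j^{*,BF,\mathrm{iter}} =
\hat m_j^{*,BF,[C_{\mathrm{iter}} \log n]}$ for $C_{\mathrm{iter}}$ large enough. Using the above arguments, we have
for $C_{\mathrm{iter}}$ large enough that

$$\hat m_j^{*,BF,\mathrm{iter}}(x_j) - m_j(x_j) = \hat m_j^{*,A,[C_{\mathrm{iter}} \log n]}(x_j) + \hat m_j^{*,B,[C_{\mathrm{iter}} \log n]}(x_j) + o_P(n^{-2/5}).$$

We argue that

$$\sup_{l \ge 1} |\hat m_j^{*,A,[l]}(x_j) - \hat m_j^{*,A}(x_j)| = o_P(n^{-2/5}), \tag{5.15}$$

$$h_j^{-2} \hat m_j^{*,B,[l]}(x_j) \to \beta_j(x_j) \quad \text{as } l \to \infty. \tag{5.16}$$

These two claims imply that

$$\hat m_j^{*,BF,\mathrm{iter}}(x_j) - m_j(x_j) = \hat m_j^{*,A}(x_j) + h_j^2 \beta_j(x_j) + o_P(n^{-2/5}).$$

This expansion shows the desired asymptotic limit result by using a standard
smoothing limit result for $\hat m_j^{*,A}(x_j)$.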

We prove (5.15) and (5.16). Claim (5.15) follows from standard smoothing 
theory as in Mammen, Linton and Nielsen (1999). For a proof of (5.16), we 

define /f ( Xj ) = P*( Xj )- ZLi,^ I ^ ( x k)f^ k{Xj {x k \x 3 )dx k with pf ( Xj ) = 
0. Similarly, as in (5.7), we can write a full cycle of these iterations as 

(5.17) ^ +11 =/5® +f+$, 

where is some additive function, f3+ (x) is equal to /?[ (x\) + ■ • ■ + (x,i) 
and T + is an extension of T + defined by T + g = T + (g — fj, g ) + \i g with \i g 

defined as above. Note that we get = Xlt=o^+'^©- This expansion shows 
that 



sup \m*' B ' [l] ( Xj ) - h 2 jP f\ = p(n~ 1 / 5 ) 



(5.18) 



sup \m]' B ' [l] ( Xj ) - hpf | = o P (n~ 2 ^ 

o-j+Cshj <Xj<bj —Cshj ,1>1 

[I] 



Furthermore, we get that the term /3+ — Ylj=i i^j m 'j ( x j ) / ( u j ~ x j)Kj,hj ( x j > 
Uj) duj — ii2,K\i^'j{xj)\ converges to /j,2,kP+ as / — > oo, where (/?**, . . . , /3 a *) 
is the minimizer of 



2 

f e> x(0,x)dx. 
This follows because the updating (5.17) is given by the first-order conditions
of this minimization problem. Together with (5.18), this implies (5.16).